A complex two-stage sample selection process was used in designing the National Longitudinal Study (NLS) of the High School Class of 1972. The first-stage sampling frame used in the selection of schools was stratified by the following seven variables:

• Type of control (public or private)

• Geographic region (Northeast, North Central, South, and West)

• Grade 12 enrollment (less than 300, 300 to 599, and 600 or more)

• Proximity to institutions of higher learning (3 categories)

• Percent minority group enrollment (8 categories, public schools only)

• Income level of the community (11 categories, public schools; 8 categories, Catholic schools)

• Degree of urbanization (10 categories)

Both a priori considerations and judgment were used in consolidating the various classes to produce the 600 final strata from which a sample of 1,200 schools was chosen. The second stage of the sample selection involved choosing a simple random sample of 18 seniors per high school. This report considers the effects of stratification, oversampling of schools by percent minority group enrollment and income level of the community, clustering of students within a school, and unequal weighting on the variances of the resulting statistics and hence the precision of the sample statistics.

The results suggest that the school stratification variables reduced the variances of national estimates by 20 percent below what would have been expected with unstratified cluster sampling. Variances of subpopulation estimates were reduced by lesser amounts, from 6 to 20 percent, depending upon the subpopulation. Clustering the sample of students increased variances of national estimates by an estimated 83.5 percent over simple random sampling, with smaller increases for various subgroups. In general, the increase in variance due to cluster sampling is only partly offset by the reduction due to stratification.

Of the five major stratification variables, SES (socioeconomic status), size of school, type of control, geographic region, and proximity to college or university, region is perhaps the strongest; type of control is the weakest; and the other three lie somewhere between.

The final section of the report describes a limited and approximate analysis to secure rough indications of the effects of unequal weightings due to oversampling, nonresponse adjustments, unequal stratum sizes, and imprecise school size measures.

This study was conducted by R.P. Moore and B.V. Shah, of Research Triangle Institute, under contract with the U.S. Department of Health, Education, and Welfare for the National Center for Education Statistics.

I. INTRODUCTION

The efficiency of the 1972 National Longitudinal Study (NLS) sample design for a base-year survey was analyzed previously using variance component estimates and estimated efficiencies [1]. In this report, average design effects for statistics estimated from the base-year data are presented. Attempts to partition the design effect into effects due to stratification, clustering, and unequal weighting are discussed. The expected increase in subpopulation sample sizes due to oversampling is calculated and compared with the actual increases observed in the base-year survey. The effects on variances of oversampling and other factors which lead to unequal weighting are approximated, and the optimum oversampling rates for several subpopulations are estimated. Several of the stratification variables are ranked from most effective to least effective in reducing the variances of survey estimates.

NOTE: References indicated in brackets are listed on page 22.

II. PARTITIONING THE DESIGN EFFECT
A. Estimated Design Effects

The design effect [2], or "Deff," defined as the ratio of the actual variance of a survey estimate to the variance for a simple random sample of the same size, is useful in evaluating a sample design. The Deff measures the combined effects of clustering, stratification, and unequal weighting on the variances of survey estimates.

Variance component estimates computed for 357 statistics in the study of NLS design efficiency were used to calculate estimated design effects. For each statistic, the components estimated were:

$\sigma_s^2$ = variation among final strata,

$\sigma_1^2$ = variation among schools within final strata, and

$\sigma_2^2$ = variation among students within schools.

The variance component estimates were used to model the variance of each statistic with the NLS design,

$$\Sigma_1^2 = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_1 n_2}, \qquad (1)$$

where

$n_1$ = the number of sample schools, and

$n_2$ = the number of sample students per school.

The approximate variance of each statistic for a simple random sample of $n_1 n_2$ students was calculated as

$$\Sigma_3^2 = \frac{\sigma_s^2 + \sigma_1^2 + \sigma_2^2}{n_1 n_2}. \qquad (2)$$

Then the design effect, D, for each statistic was estimated as

$$D = \frac{\Sigma_1^2}{\Sigma_3^2}, \qquad (3)$$

using $n_1$ = 1,043 and $n_2$ = 17, the approximate numbers of responding sample schools and students per school in the NLS base-year survey.
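
The calculation in equations 1 to 3 can be sketched in a few lines of Python. This is only an illustration: it assumes equation 1 has the form $\sigma_1^2/n_1 + \sigma_2^2/(n_1 n_2)$ and equation 2 is the sum of the three components divided by $n_1 n_2$, and the component values below are hypothetical shares of total variance, not estimates from the report. The function names are also invented for this sketch.

```python
def variance_nls_design(var_schools, var_students, n_schools, n_per_school):
    """Equation 1 (assumed form): variance under the stratified two-stage design."""
    return var_schools / n_schools + var_students / (n_schools * n_per_school)


def variance_srs(var_strata, var_schools, var_students, n_schools, n_per_school):
    """Equation 2 (assumed form): variance for a simple random sample of the same size."""
    total = var_strata + var_schools + var_students
    return total / (n_schools * n_per_school)


def design_effect(var_strata, var_schools, var_students,
                  n_schools=1043, n_per_school=17):
    """Equation 3: design variance divided by simple random sampling variance."""
    numerator = variance_nls_design(var_schools, var_students, n_schools, n_per_school)
    denominator = variance_srs(var_strata, var_schools, var_students,
                               n_schools, n_per_school)
    return numerator / denominator


# Hypothetical components (strata, schools, students), chosen only for illustration.
components = dict(var_strata=0.022, var_schools=0.030, var_students=0.948)
deff = design_effect(**components)
print(f"D = {deff:.3f}, root D = {deff ** 0.5:.3f}")
```

Note that the number of schools cancels in the ratio, so under these assumptions D depends only on the number of students per school and the relative sizes of the components.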
Table 1 shows the average values of the design effects and root design effects calculated, by type of statistic. We note that the estimated design effects tend to be largest for national means and tend to vary with the average cluster size (number of respondents per school) for subgroup means. The design effects for subgroup, or domain, means tend to be larger than those for the differences between subgroup and national means.

Table 1. Average number of respondents and average estimated design effects

                                  Number of      Number                  Square root
Type of statistic                 respondents    of          Design      of design
                                  per school     statistics  effect D    effect

National means                      15.363          21        1.463       1.203
Subgroup means:
  White                             11.711          42        1.327       1.147
  Females                            7.690          42        1.213       1.097
  Males                              7.552          42        1.173       1.081
  Father high school graduate        6.399          42        1.156       1.074
  Father less than high school       4.651          42        1.117       1.056
  Father college graduate            2.440          42        1.119       [illegible]
  Black                              1.888          42        1.219       [illegible]
  Other races                        1.465          42        1.182       1.085
All domain means                     5.475         168        1.233       [illegible]
Differences of domain and
  national means                     5.475         168        1.143       1.067
All statistics                       6.056         357        1.204       1.094

Assumes $n_2$ = 17.



The root design effects computed using variance component estimates (table 1) are 10 to 15 percent higher than comparable ones tabulated by William B. Fetters [3] using the conventional between-PSU-within-stratum variance summed over strata. This is not surprising, recalling that the variance component estimates are thought to be overestimates [1] and realizing that equation 3 may be rewritten as

$$D = \frac{n_2 \sigma_1^2 + \sigma_2^2}{\sigma_s^2 + \sigma_1^2 + \sigma_2^2}. \qquad (4)$$

From the above, we see that if $\sigma_2^2$ and/or $\sigma_1^2$ were overestimated to a greater extent than the remaining components, then D would be overestimated.

B. Effects of Stratification and Clustering

We can also use the variance component estimates to approximate the effect of clustering the sample of students by school and the effect of stratifying schools. The effects on the variances of survey estimates are of interest in studying the efficiency of the sample design. Recalling equation 3, $D = \Sigma_1^2 / \Sigma_3^2$, the estimated design effect may be rewritten as

$$D = 1 + (n_2 - 1)\, r_{c/w} - n_2\, r_{s/w} \qquad (5)$$

or

$$D = C_{rw} - n_2\, r_{s/w}, \qquad (6)$$

where

$$r_{c/w} = \frac{\sigma_s^2 + \sigma_1^2}{\sigma_s^2 + \sigma_1^2 + \sigma_2^2} \qquad (7)$$

and

$$r_{s/w} = \frac{\sigma_s^2}{\sigma_s^2 + \sigma_1^2 + \sigma_2^2}. \qquad (8)$$

The first term of equations 5 and 6 represents the effect of clustering the sample of students by school attended. Here $r_{c/w}$ is the intraschool cluster correlation for an unstratified selection of schools and students, given the unequal weighting of the NLS base-year sample design. The last term in equations 5 and 6 represents the reduction in the variances of survey estimates obtained from school stratification, where $r_{s/w}$ is the intrastratum cluster correlation for a random selection of students from an unstratified frame.

If we introduce $\Sigma_2^2$, the variance of a survey estimate for an unstratified cluster sample,

$$\Sigma_2^2 = \frac{\sigma_s^2 + \sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_1 n_2}, \qquad (9)$$

then we can write

$$C_{rw} = \frac{\Sigma_2^2}{\Sigma_3^2} = 1 + (n_2 - 1)\, r_{c/w}$$

as before. Using equation 9, we can also write the effect of stratification, $S_{cw}$, in a multiplicative model as

$$S_{cw} = \frac{\Sigma_1^2}{\Sigma_2^2} = \frac{C_{rw} - n_2\, r_{s/w}}{C_{rw}}. \qquad (10)$$

Now we can write

$$D = \frac{\Sigma_1^2}{\Sigma_3^2} = C_{rw}\, S_{cw}, \qquad (11)$$

and the design effect has been partitioned into $C_{rw}$, the effect of clustering, and $S_{cw}$, the effect of stratification.
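
The partition into a clustering factor and a stratification factor can be sketched as follows. The sketch assumes the relations $C_{rw} = 1 + (n_2 - 1) r_{c/w}$, $D = C_{rw} - n_2 r_{s/w}$, and $S_{cw} = D / C_{rw}$; the function name is invented here, and the input ratios are rounded illustrative values close to those implied by the national-means figures in table 2 ($C_{rw}$ = 1.835 and $n_2 r_{s/w}$ = 0.372 give roughly $r_{c/w} \approx 0.052$ and $r_{s/w} \approx 0.022$).

```python
def partition_design_effect(r_cw, r_sw, n_per_school=17):
    """Split the design effect into clustering (C_rw) and stratification (S_cw) factors.

    r_cw: intraschool correlation for an unstratified selection of schools.
    r_sw: intrastratum correlation.
    Returns (C_rw, S_cw, D) with D = C_rw * S_cw.
    """
    c_rw = 1 + (n_per_school - 1) * r_cw   # effect of clustering
    d = c_rw - n_per_school * r_sw         # overall design effect
    s_cw = d / c_rw                        # effect of stratification
    return c_rw, s_cw, d


# Rounded, illustrative ratios (assumed, see the note above).
c_rw, s_cw, d = partition_design_effect(r_cw=0.052, r_sw=0.022)
print(f"C_rw = {c_rw:.3f}, S_cw = {s_cw:.3f}, D = {d:.3f}")
print(f"variance reduction from stratification: {100 * (1 - s_cw):.0f} percent")
```

Because the inputs are rounded to three decimals, the results agree with the tabulated national figures only approximately.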

Table 2 shows the average values of design factors calculated for the NLS sample design using the variance component estimates described earlier. (The $S_{cw}$ values shown were derived from the average $C_{rw}$ and D values.) Clustering the sample of students increased variances of national estimates by an estimated 83.5 percent over simple random sampling. Stratifying the clusters using the NLS school stratification scheme reduced the variances of national estimates by an estimated 20 percent, or $100(1 - S_{cw})$, on the average, below what they would have been with unstratified cluster sampling. Both effects are reduced for subgroups, and there appears to be a tendency for both effects to approach 1.0 as the subpopulation size gets small. In general, the increase in variance due to cluster sampling is only partly offset by the reduction due to stratification. Table 3 shows average values of the ratios of variance components, $r_{s/w}$ and $r_{c/w}$, used in the modeling.

Having estimated the reduction in variance from the stratification variables used in the NLS design, one also would like to compare the effectiveness of the individual stratification variables. Knowledge of which variables were most effective in reducing the variance of survey estimates would be useful in designing future NLS samples and also in the design of similar samples. The results of comparing the individual stratification variables are shown in section III of this report.

Design effects (and variances of estimates) are also affected by the unequal weighting of the individual elements of the sample. The effect of unequal weighting is discussed in the next section.



Table 2. Average effects of clustering and stratification for the NLS design¹

Statistic                           C_rw          n_2 r_s/w     D         S_cw

National means                      1.835         0.372         1.463     .797
Subgroup means:
  White                             1.655          .329         1.327     .802
  Females                           1.364          .151         1.213     .889
  Males                             1.356          .183         1.173     .865
  Father high school graduate       1.302          .146         1.156     .888
  Father less than high school      1.274          .157         1.117     .877
  Father college graduate           1.188          .069         1.119     .942
  Black                             1.458          .239         1.219     .836
  Other races                       1.311          .129         1.182     .902
All domain means                    [illegible]   [illegible]   1.233     [illegible]
Differences of domain and
  national means                    1.296          .154         1.142     .882
All statistics                      1.391          .187         1.204     .857

¹Assumes $n_2$ = 17.

Table 3. Average ratios of variance component estimates

                                                            Number of
Domain                              r_s/w       r_c/w       statistics

National means                      0.022       0.052           21
Subgroup means:
  White                              .019        .041           42
  Females                            .009        .023           42
  Males                              .011        .022           42
  Father high school graduate        .009        .019           42
  Father less than high school       .009        .017           42
  Father college graduate            .004        .012           42
  Black                              .014        .029           42
  Other races                        .008        .019           42
All domain means                     .012        .027          168
Differences of domain and
  national means                     .009        .019          168
All statistics                       .011        .024          357

C. Effect of Unequal Weighting

The variances of survey estimates are increased when the sample elements (students) have unequal weights. Unequal weights arise from oversampling certain subpopulations, from using imprecise size measures to select sample schools, and from nonresponse adjustments. The estimated design effects presented in the previous section include the effect of unequal weighting, as do the estimated effects of stratification and/or clustering. That is, in the previous section the design effect was partitioned into

$$D = C_{rw}\, S_{cw},$$

whereas it would be possible to partition the design effect into

$$D = WSC.$$

Folsom [4] discusses the methodology which could be used to estimate the effect of unequal weighting and other finer partitionings of the design effect (see equation 62 in reference 4). Unfortunately, conducting the analysis described by Folsom was beyond the scope of the project, as it would have required the development of several new computer programs, estimation of an additional set of variance components, and additional analysis time.

In order to obtain some information about the effects of unequal weighting in the NLS design, a more limited and approximate analysis was conducted. The analysis involved estimating the approximate effect of unequal weighting on the variances of survey estimates. A portion of the unequal weighting is due to oversampling a part of the population, and the effect of this oversampling is estimated. The remainder of the unequal weighting, aside from oversampling, is caused by nonresponse adjustments, unequal stratum sizes, and imprecise school size measures. Estimates of the combined effect of these factors were also computed. The reader should be cautioned that the analyses presented here are based on oversimplifications and far-reaching assumptions, and the results should be regarded as rough indications of the effects rather than precise estimates.

1. Extent of Oversampling

The school sampling frame for the 1972 NLS was divided into two socioeconomic (SES) strata. The low SES stratum (type A schools) was formed by grouping schools with high percentages of minority students and/or schools located in low income areas. The high SES stratum (type B schools) consisted of all other schools in the sampling frame. Students from the low SES stratum were sampled at approximately twice the sampling rate used in the high SES stratum, in order to increase the number of sample students who belonged to critical subpopulations: the minorities, the poor, and the poorly educated. (Additional details are given in the Westat report [5] on the sample design.)

Data needed to complete this analysis included sample counts and estimated subpopulation sizes for the low SES and high SES strata separately. These data are shown in tables 4 and 5 for subpopulations defined by sex, race, and father's education. Also shown, for general interest, are "adjusted" estimates, where the "not reported" estimates and sample sizes were proportionally added to the remaining categories for each subpopulation-defining variable. The estimated totals for the low and high SES strata are close to the estimated numbers of seniors (983,240 and 2,064,647) used in designing the sample [5], considering that the latter were estimates based on enrollments in earlier school years and that some of the schools in the sampling frame had closed by the time the survey was conducted.
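
The proportional adjustment described above can be illustrated with a short Python sketch. The function name is invented for this example, and the rule shown (each reported category receives a share of the "not reported" total in proportion to its own size) is an assumption about how the adjusted figures were formed; the inputs are the unadjusted low SES estimates by sex from table 4.

```python
def allocate_not_reported(reported, not_reported):
    """Spread a 'not reported' total over the reported categories proportionally."""
    reported_total = sum(reported.values())
    return {
        category: value + not_reported * value / reported_total
        for category, value in reported.items()
    }


# Unadjusted low SES (type A) estimates by sex, from table 4.
low_ses_by_sex = {"male": 426_902, "female": 438_887}
adjusted = allocate_not_reported(low_ses_by_sex, not_reported=72_509)

for category, value in adjusted.items():
    print(f"{category}: {value:,.0f}")
```

Under this assumed rule the category shares of the reported total are unchanged, and the adjusted categories sum to the stratum total.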

The "not reported" categories for father's education include both students who answered the question as "not applicable" and those who left the question blank. The estimated subpopulation size estimates indicate, as might be expected, that students

Table 4. Estimated subpopulation sizes for low and high SES adjusted for missing subpopulation classifier variables

                                 Low SES (type A)       High SES (type B)           Total
Subpopulation                    Number     Percent     Number      Percent     Number      Percent

Unadjusted estimates
Sex:
  Male                           426,902      45.5        927,144     46.0      1,354,046     45.8
  Female                         438,887      46.8        921,729     45.7      1,360,616     46.1
  Not reported                    72,509       7.7        166,257      8.3        238,767      8.1
Race:
  White                          537,321      57.3      1,655,621     82.2      2,192,942     74.3
  Black                          197,227      21.0         55,398      2.7        252,624      8.6
  Other                          118,103      12.6        115,763      5.7        233,866      7.9
  Not reported                    85,648       9.1        188,348      9.3        273,997      9.3
Father's education:
  Less than high school
    graduate                     281,679      30.0        426,640     21.2        708,319     24.0
  High school graduate           212,265      22.6        529,020     26.3        741,286     25.1
  College graduate               197,064      21.0        705,395     35.0        902,459     30.6
  Not reported                   247,291      26.4        354,074     17.6        601,365     20.4
Total                            938,299     100.0      2,015,130    100.0      2,953,429    100.0

Adjusted estimates¹
Sex:
  Male                           462,655      49.3      1,010,516     50.1      1,473,171     49.9
  Female                         475,643      50.7      1,004,614     49.9      1,480,257     50.1
Race:
  White                          591,294      63.0      1,826,322     90.6      2,417,616     81.9
  Black                          217,038      23.1         61,110      3.0        278,148      9.4
  Other                          129,966      13.9        127,699      6.3        257,665      8.7
Father's education:
  Less than high school
    graduate                     382,483      40.7        517,584     25.7        900,067     30.5
  High school graduate           288,228      30.7        641,787     31.8        930,015     31.5
  College graduate               267,587      28.5        855,759     42.5      1,123,346     38.0
Total                            938,299     100.0      2,015,130    100.0      2,953,429    100.0

¹Adjusted estimates computed by proportionately allocating "not reported" estimate to other categories.

Table 5. Subpopulation sample sizes for low and high SES adjusted for missing subpopulation classifier variables

                                 Low SES (type A)       High SES (type B)           Total
Subpopulation                    Number     Percent     Number      Percent     Number      Percent

Unadjusted counts
Sex:
  Male                             3,786      45.2          4,289     45.9          8,075     45.6
  Female                           3,946      47.1          4,256     45.5          8,202     46.3
  Not reported                       640       7.6            809      8.6          1,449      8.2
Race:
  White                            4,775      57.0          7,652     81.8         12,427     70.1
  Black                            1,807      21.6            252      2.7          2,059     11.6
  Other                            1,036      12.4            542      5.8          1,578      8.9
  Not reported                       754       9.0            908      9.7          1,662      9.4
Father's education:
  Less than high school
    graduate                       2,492      29.8          1,953     20.9          4,445     25.1
  High school graduate             1,883      22.5          2,420     25.9          4,303     24.3
  College graduate                 1,770      21.1          3,306     35.3          5,076     28.6
  Not reported                     2,227      26.6          1,675     17.9          3,902     22.0
Total                              8,372     100.0          9,354    100.0         17,726    100.0

Adjusted estimates¹
Sex:
  Male                             4,099      49.0          4,696     50.2          8,796     49.6
  Female                           4,273      51.0          4,659     49.8          8,932     50.4
Race:
  White                            5,246      62.7          8,475     90.6         13,723     77.4
  Black                            1,986      23.7            279      3.0          2,265     12.8
  Other                            1,139      13.6            600      6.4          1,739      9.8
Father's education:
  Less than high school
    graduate                       3,395      40.6          2,379     25.4          5,774     32.6
  High school graduate             2,565      30.6          2,948     31.5          5,513     31.1
  College graduate                 2,411      28.8          4,027     43.1          6,438     36.3
Total                              8,372     100.0          9,354    100.0         17,726    100.0

¹Adjusted estimates computed by proportionately allocating "not reported" sample size to other categories.

of minority races and students with poorly educated fathers make up larger percentages of the low SES stratum than they do of the high SES stratum.

In table 5, it may be noted that the overall participation rate was 77.5 percent in the low SES stratum and 86.6 percent in the high SES stratum, since the target sample size was 10,800 students in each. The percentages of sample students who were black, of other races, with poorly educated fathers, and with father's education unknown would have been higher if both SES groups had participated at the same rate.

The amount of oversampling achieved for various subpopulations in the 1972 NLS base-year survey has been estimated by Fetters [6]. What is perhaps less well known is the amount of oversampling that should have been expected, given the sample design and the distribution of the target populations within the oversampled and undersampled portions of the universe. Prior to using the data from tables 4 and 5 to estimate this, we will introduce the following notation, which is essentially that used in the recent article by Waksberg [7].

Let $N_1$ and $N_2$ be the total populations of stratum 1 and stratum 2, respectively, where $N_2 = v N_1$ and $v > 0$.

Let $t_1$ and $t_2$ be the proportions of stratum 1 and stratum 2 belonging to a specified subgroup.

Let $r_1$ and $r_2$ be the sampling rates used in stratum 1 and stratum 2, respectively, where $r_1 = k r_2$ and $k \ge 1$.

Now we can write the expected increase in subpopulation sample sizes, due to oversampling, as the ratio

$$\frac{r_1 t_1 N_1 + r_2 t_2 N_2}{r\,(t_1 N_1 + t_2 N_2)} = \frac{r_1}{r} \cdot \frac{t_1 N_1}{t_1 N_1 + t_2 N_2} + \frac{r_2}{r} \cdot \frac{t_2 N_2}{t_1 N_1 + t_2 N_2}, \qquad (12)$$

where r = the uniform sampling rate for a proportional allocation which will give the same expected sample size; that is, $r(N_1 + N_2) = r_1 N_1 + r_2 N_2$. The numerator of equation 12 is the subpopulation sample size expected with oversampling, and the denominator is the subpopulation sample size expected with no oversampling. The estimates in table 4 were used to calculate the first two columns of table 6, which are estimated values of

$$\frac{t_1 N_1}{t_1 N_1 + t_2 N_2} \quad \text{and} \quad \frac{t_2 N_2}{t_1 N_1 + t_2 N_2}.$$

The sampling rates for the 1972 NLS were calculated from data in the Westat report [5] as

$r_1$ = 10,800/983,240 = .010984,

$r_2$ = 10,800/2,064,647 = .005231, and

$r$ = 21,600/3,047,887 = .007087.

Thus

$$\frac{r_1}{r} = 1.550, \qquad \frac{r_2}{r} = .738, \qquad \text{and} \qquad k = \frac{r_1}{r_2} = 2.100.$$

Using these figures in equation 12, the expected increases in sample sizes for various subpopulations were computed and are shown in the third column of table 6. For comparison, the actual oversampling achieved in the survey, calculated as the percent of sample cases belonging to the subpopulation (table 5) divided by the estimated percent of the population (table 4) belonging to the subpopulation, is shown in the last column of table 6. The actual oversampling achieved agrees quite closely with that which would be expected, given the design. Note that to obtain much increase in the subpopulation sample size from oversampling, there must be a large proportion of the population in the stratum which is sampled at the higher-than-proportional rate.
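
The expected and actual oversampling ratios can be reproduced approximately with the sketch below. It assumes the expected ratio is the share of the subpopulation in the low SES stratum times $r_1/r$ plus the share in the high SES stratum times $r_2/r$, and that the actual ratio is the sample percent divided by the population percent. The function and constant names are invented here; the counts and percents are those legible in tables 4 and 5.

```python
# Sampling rates from the design counts quoted in the text.
RATE_LOW = 10_800 / 983_240        # r_1, low SES stratum
RATE_HIGH = 10_800 / 2_064_647     # r_2, high SES stratum
RATE_UNIFORM = 21_600 / 3_047_887  # r, proportional allocation of the same size


def expected_increase(share_in_low_ses):
    """Expected sample-size ratio from oversampling (assumed form of equation 12)."""
    share_in_high_ses = 1.0 - share_in_low_ses
    return (RATE_LOW / RATE_UNIFORM) * share_in_low_ses \
        + (RATE_HIGH / RATE_UNIFORM) * share_in_high_ses


def actual_increase(sample_percent, population_percent):
    """Achieved ratio: percent of the sample over estimated percent of the population."""
    return sample_percent / population_percent


# Shares of each subpopulation located in the low SES stratum (table 4 counts).
white_share_low = 537_321 / 2_192_942
black_share_low = 197_227 / 252_624

print(f"r1/r = {RATE_LOW / RATE_UNIFORM:.3f}, r2/r = {RATE_HIGH / RATE_UNIFORM:.3f}")
print(f"white: expected {expected_increase(white_share_low):.2f}, "
      f"actual {actual_increase(70.1, 74.3):.2f}")
print(f"black: expected {expected_increase(black_share_low):.2f}")
```

A subpopulation gains sample only when its share in the oversampled stratum is large enough for the first term to outweigh the reduced rate applied to the rest.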



Table 6. Expected and actual effect of oversampling on subpopulation sample sizes, 1972 NLS base-year survey

                                     Estimated proportion of       Effect of oversampling
                                     subpopulation members in      on sample sizes
Subpopulation                        Low SES       High SES        Expected      Actual
                                     stratum       stratum

Sex:
  Male                               0.315         0.685           0.99          1.00
  Female                              .323          .677           1.00          1.00
  Not reported                        .304          .696            .98          1.01
Race:
  White                               .245          .755            .94           .94
  Black                               .781          .219           1.37          1.35
  Other                               .505          .495           1.15          1.13
  Not reported                        .313          .687            .99          1.01
Father's education:
  Less than high school graduate      .398          .602           1.06          1.05
  High school graduate                .286          .714            .97           .97
  College graduate                    .218          .782            .92           .93
  Not reported                        .411          .589           1.07          1.08



2. Effect of Oversampling

Waksberg [7] gives a convenient formula for computing the approximate increase or decrease in variances of subpopulation means as

$$\frac{\sigma_k^2}{\sigma_o^2} = \frac{(k + v)(u + k v)}{k\,(1 + v)(u + v)}, \qquad (13)$$

where

$\sigma_k^2$ = the variance of an estimated subpopulation mean with oversampling,

$\sigma_o^2$ = the variance of an estimated subpopulation mean with proportional sampling, and

$u = t_1 / t_2$.

(The remaining symbols were defined in the previous section.) Equation 13 assumes simple random sampling of subpopulation members within each of the two strata, a common variance within strata, and a very small sampling rate within strata. The first of these assumptions is considerably different from the NLS design, which points out again that using equation 13 permits only rough approximation to the effect of oversampling on the variances of survey estimates.

The approximate effect of oversampling on the variances of survey estimates was calculated using equation 13 with k = 2.100, v = 2.148, and the values of u for each subpopulation obtained from the $t_h$ estimates in table 4. Table 7 shows the estimated effect of oversampling for each subpopulation. The variances were increased by oversampling in the NLS design for most subpopulations, and a moderate reduction was obtained only for blacks. Variances of estimates of the total population of students were increased by 13 percent. Proportional sampling is optimal for total population estimates. The increase in variance of estimates for the total population may also be written as

$$\frac{\sigma_k^2}{\sigma_o^2} = 1 + \frac{v\,(k - 1)^2}{k\,(1 + v)^2} \quad \text{for } k > 1. \qquad (14)$$

Waksberg also shows that, with the assumptions stated earlier, the optimum rate of oversampling for estimated subpopulation means is

$$\text{opt } k = \sqrt{u}. \qquad (15)$$

Table 7 shows the approximate optimum k for each subpopulation. The NLS design, with k = 2.100, employed more than the optimum oversampling rate for all subpopulations shown here except blacks, where a higher degree of oversampling would have been optimal. For a number of subpopulations with u < 1, proportional sampling was indicated.

The effect of oversampling on the variances, as estimated here, is only a part of the effect of unequal weighting. The assumption of simple random sampling within the two strata implies equal weighting within strata, whereas the NLS sample had unequal weights. The increase in variance due to unequal weighting from factors other than oversampling is discussed in the next section.
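
The oversampling calculations can be sketched numerically as follows. The sketch assumes equation 13 has the form $(k + v)(u + kv) / [k(1 + v)(u + v)]$ and that the optimum rate is $\sqrt{u}$; these forms reproduce legible table 7 entries such as 1.18 for whites and 2.79 for the optimum k for blacks. The function names are invented for this example.

```python
import math

K_NLS = 2.100  # ratio of sampling rates, r_1 / r_2
V_NLS = 2.148  # ratio of stratum sizes, N_2 / N_1


def variance_ratio(u, k=K_NLS, v=V_NLS):
    """Variance with oversampling relative to proportional sampling (assumed equation 13)."""
    return (k + v) * (u + k * v) / (k * (1 + v) * (u + v))


def optimum_k(u):
    """Assumed equation 15. Values below 1 indicate that proportional sampling is preferred."""
    return math.sqrt(u)


# u = t_1 / t_2, using the unadjusted percents in table 4.
u_values = {"White": 57.3 / 82.2, "Black": 21.0 / 2.7, "Total": 1.0}

for name, u in u_values.items():
    print(f"{name}: u = {u:.3f}, variance ratio = {variance_ratio(u):.3f}, "
          f"optimum k = {optimum_k(u):.2f}")
```

For u = 1 the same function agrees with the closed form $1 + v(k - 1)^2 / [k(1 + v)^2]$, which is the total-population case.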

Table 7. Estimated effect of oversampling on the variances of survey estimates and optimum oversampling rates for subpopulations

                                                      Variance ratio
Subpopulation                        u = t_1/t_2      (oversampling to      Optimum k
                                                      proportional)

Sex:
  Male                               0.989            1.13                  0.99
  Female                             1.024            1.12                  1.01
  Not reported                        .928            1.14                   .96
Race:
  White                               .697            1.18                   .83
  Black                              7.778             .80                  2.79
  Other                              2.211             .99                  1.49
  Not reported                        .978            1.13                   .99
Father's education:
  Less than high school graduate     1.415            1.07                  1.19
  High school graduate                .859            1.15                   .93
  College graduate                    .600            1.20                   .77
  Not reported                       1.500            1.06                  1.22
Total                                1.000            1.13                  1.00



3. Effect of Unequal Weighting within the Low SES and High SES Strata

The effect of unequal weighting within the low SES and high SES strata can be approximated by considering the estimated total, X', written as

$$X' = \sum_{h=1}^{300} \sum_{i=1}^{n_h} \sum_{j=1}^{n_{hi}} W_{hij} X_{hij} + \sum_{h=301}^{600} \sum_{i=1}^{n_h} \sum_{j=1}^{n_{hi}} W_{hij} X_{hij}, \qquad (16)$$

and its variance

$$\mathrm{Var}(X') = \sum_{h=1}^{300} \sum_{i=1}^{n_h} \sum_{j=1}^{n_{hi}} W_{hij}^2 \sigma_h^2 + \sum_{h=301}^{600} \sum_{i=1}^{n_h} \sum_{j=1}^{n_{hi}} W_{hij}^2 \sigma_h^2, \qquad (17)$$

where

$W_{hij}$ = weight for student hij,

$X_{hij}$ = value of variable X for student hij,

$n_h$ = number of sample schools in stratum h, and

$n_{hi}$ = number of sample students in school i of stratum h.

If the weights within the low SES stratum (strata 1-300) were all equal to $W_1$ and if those within the high SES stratum all equalled $W_2$, then we could rewrite equation 17 as

$$\mathrm{Var}_e(X') = \sum_{h=1}^{300} W_1^2 \sigma_h^2 \sum_{i=1}^{n_h} n_{hi} + \sum_{h=301}^{600} W_2^2 \sigma_h^2 \sum_{i=1}^{n_h} n_{hi}. \qquad (18)$$

Now we can approximate the increase in variance due to unequal weighting within the high and low SES strata as

$$\frac{\mathrm{Var}(X')}{\mathrm{Var}_e(X')} = \frac{\sum_{h} \sum_{i} \sum_{j} W_{hij}^2}{n_1 (W_1)^2 + n_2 (W_2)^2}, \qquad (19)$$

where

$$n_1 = \sum_{h=1}^{300} \sum_{i=1}^{n_h} n_{hi}, \qquad n_2 = \sum_{h=301}^{600} \sum_{i=1}^{n_h} n_{hi}, \qquad \text{and} \qquad \sigma_h^2 = \sigma^2 \text{ for all } h.$$

Table 8 shows the average weight values, the sum of the squared weights, and the approximate increase in variance estimated using equation 19. The estimated increase is fairly sizable for all subpopulations and for the total population. This portion of the unequal weighting arises from unequal final stratum sizes, imprecise size measures, and from weight adjustments to correct for nonresponse. The results in this section should be regarded as rough approximations, since assumptions of equal variances within strata and fixed subpopulation sizes are required.
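
The ratio described above can be sketched in Python. The sketch assumes equation 19 is the sum of squared weights divided by $n_1 W_1^2 + n_2 W_2^2$, with $W_1$ and $W_2$ taken as the average weights in the low and high SES strata. The function names and the toy weights are invented for illustration; the summary check uses the total-sample average weights and sum of squared weights from table 8 together with assumed respondent counts of 8,372 and 9,354, which are inferred from the participation rates rather than quoted directly.

```python
def unequal_weighting_effect(weights_low, weights_high):
    """Approximate variance inflation from unequal weights within two strata."""
    n_low, n_high = len(weights_low), len(weights_high)
    mean_low = sum(weights_low) / n_low      # W_1, average low SES weight
    mean_high = sum(weights_high) / n_high   # W_2, average high SES weight
    sum_squares = sum(w * w for w in weights_low) + sum(w * w for w in weights_high)
    return sum_squares / (n_low * mean_low ** 2 + n_high * mean_high ** 2)


def effect_from_summary(sum_squares, n_low, mean_low, n_high, mean_high):
    """Same ratio computed from published summary figures instead of raw weights."""
    return sum_squares / (n_low * mean_low ** 2 + n_high * mean_high ** 2)


# Toy weights (hypothetical) showing that any spread within a stratum inflates variance.
toy_effect = unequal_weighting_effect([90.0, 110.0, 130.0], [180.0, 250.0])

# Total-sample figures from table 8; the two respondent counts are assumptions.
total_effect = effect_from_summary(623_398_888, 8_372, 112.08, 9_354, 215.43)

print(f"toy example: {toy_effect:.3f}")
print(f"total sample: {total_effect:.2f}")
```

When every weight within a stratum equals the stratum average, the ratio is exactly 1, so the measure isolates the within-stratum variation in weights from the deliberate difference between the two strata.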

Table 8. Estimated effect of unequal weighting within low and high SES strata on variances of survey estimates

                                        Average weight                               Estimated
                                     Low SES      High SES      Sum of squares      effect of
Subpopulation                        (W_1)        (W_2)         of weights          unequal
                                                                                    weighting

Sex:
  Male                               112.76       216.17        287,468,845         1.16
  Female                             111.22       216.57        284,457,979         1.15
  Not reported                       113.30       205.51         51,472,065         1.21
Race:
  White                              112.53       216.36        479,929,309         1.15
  Black                              109.15       219.83         40,290,221         1.20
  Other                              114.00       213.58         44,009,940         1.15
  Not reported                       113.59       207.43         59,169,419         1.21
Father's education:
  Less than high school graduate     113.03       218.45        143,103,131         1.14
  High school graduate               112.73       218.60        160,955,721         1.15
  College graduate                   111.34       213.37        198,398,932         1.15
  Not reported                       111.04       211.39        120,941,105         1.18
Total sample                         112.08       215.43        623,398,888         1.16
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COMPARING THE STRATIFICATION"VARIABLes 



• 4 



* The variance modeling described ln sec- 
tion 11.8. oUthis report suggests that the 
NLS school watification variables reduced 
^ the vg-iances of national estimates by ap- 
££[^f^^^W20 percent compared with sam- 
^nPS^g cl.i/Sters of students selected from an 
^nstratified school frame. Variances of sub- 
population estimates were reduced by less- 
er amounts, from 6 to 20 peccent/ depend- 
ing on the subpopulation. In this 'section, 
analyses aimed at determining which strati- 
fication ^v^riables accounted for most- of the 
- reductibn in variance are described. 

The analysis involved calculating several sets of variance component
estimates for a linear variance model which includes terms for the five
major stratification variables. By extending the linear model given in
section IV of reference [1], variance components corresponding to the
following stratification variables were estimated: SES (socioeconomic
status), size of school, type of control (public, Catholic, non-Catholic
private), geographic region, and proximity to college or university.
When the sampling frame was stratified, crossing of the first four of
these variables divided the population of schools into 35 strata. Then
the fifth stratification variable, proximity, was used to subdivide
certain of the 35 strata; this resulted in 64 strata based upon these
five variables. Next, a total of 289 major strata were defined by
constructing nested substrata within the 64 strata mentioned above,
based on percent minority (public schools) and average income level
(public and Catholic schools). Final strata were defined as nested
substrata within major strata, based on degree of urbanization. For the
purposes of this analysis, only the five major stratification variables
described above were studied.



A difficulty was encountered which relates to the order in which the
stratification variables are placed in the model. Since the five major
stratification variables may be regarded as crossed, the model could be
specified using any one of 5! = 120 models corresponding to the 120
possible arrangements of the five variables. Also, the earlier in the
model a variable is placed, the more negative estimates (set equal to
zero) will be calculated, since the components are estimated from right
to left in the model (the component for the last term of the model is
estimated first). With eight components to be estimated (five
stratification effects plus final stratum, school, and student
components), the number of negative estimates obtained was expected to
be sizable. Thus, it was not clear how to proceed, and computing a set
of components for each of the 120 models was not considered feasible.
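
The count of 120 arrangements can be checked by direct enumeration. The sketch below also builds a small subset of five orderings by cyclic rotation, in which every variable occupies every position exactly once. The rotation scheme and the variable labels are assumptions for illustration; the report does not state which particular orderings were run.

```python
from itertools import permutations

# Labels for the five major stratification variables (names are illustrative).
VARIABLES = ["SES", "Size", "Control", "Region", "Proximity"]

# Every possible left-to-right arrangement of the five terms in the model.
all_orderings = list(permutations(VARIABLES))


def cyclic_rotations(items):
    """Return len(items) orderings, each a rotation of the original list.

    Across the rotations, each item appears once in every position.
    """
    return [tuple(items[i:] + items[:i]) for i in range(len(items))]


# Hypothetical reduced design: 5 runs instead of 120.
rotations = cyclic_rotations(VARIABLES)

print(len(all_orderings))      # number of possible models
for ordering in rotations:
    print(" - ".join(ordering))
```

Any such reduced design covers only a small fraction of the possible arrangements, so conclusions drawn from it remain conditional on the orderings chosen.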

As a first step toward gaining some feel for the relative utility of the
five stratification variables, five models were specified and five
variance components runs were completed. The models were chosen so that
each of the five variables was first in one model and fifth in another
model. A subset of 10 of the 21 variables used in the previous variance
components study [1] was chosen for this part of the analysis. Also,
only four subpopulation estimates were included (males, females, whites,
and blacks). Thus 90 statistics were included in each of the 5 variance
components runs: 10 national estimates, 40 domain estimates, and 40
differences of domain and national estimates. The analysis consisted of
comparing the number of negative variance component estimates for the
five stratification variables when the variable was first in the model
and also fifth in the model. If the effect of one of the stratification
variables was zero, then we should observe about 50 percent of the
estimates for that variance component to be negative. Table 9 shows the
number of negative estimates obtained for each of the five variables by
type of statistic. When one of the variables is written fifth in the
model, estimates of the component are least biased by the large number
of terms in the model. Looking at the lower part of table 9, we note
that all five of the stratification variables have positive effects.
(Using a simple sign test based on the numbers of positive and negative
estimates, the hypothesis of zero effect would be rejected for each
variable for national means, domain means, differences of domain and
national means, and all statistics.) Looking at the upper half of table
9, we see the effects of position in the model on the numbers of
negative variance component estimates. Using this data, we would reject
the hypothesis of effect equal to zero only for control and region,
based on a sign test. But since we know the number of negative estimates
will be biased upward due to the large number of terms estimated, we
cannot conclude anything from this type of test. We must also keep in
mind that we have used only five of the 120 possible arrangements of the
model and that the results here may depend on the model used.

About all we can conclude from table 9 is that region appears perhaps
the strongest stratification variable, that control is perhaps the
weakest, and that the other three variables are somewhere in between.
There were also indications that the numbers of negative component
estimates for several of the variables were sensitive to the position in
the model of the control variable. This was thought to arise from the
extremely large differences in the population and sample sizes for the
three levels of control: students enrolled in public, Catholic, and
non-Catholic private schools.
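
The sign test referred to in the discussion of table 9 can be sketched as follows. This is a minimal illustration, assuming a two-sided exact binomial test at the 5-percent level and treating the 90 statistics as independent; the report states neither the exact form of the test nor the significance level, and the statistics are in fact correlated. The counts are the "all statistics" entries of table 9, with the first-position count for control taken as the sum of its three components (5 + 29 + 34).

```python
from math import comb


def sign_test_p_value(negatives, total):
    """Two-sided exact sign test p-value under P(negative) = 0.5."""
    k = min(negatives, total - negatives)
    tail = sum(comb(total, i) for i in range(k + 1)) / 2 ** total
    return min(1.0, 2 * tail)


# Negative estimates out of 90 statistics ("all statistics" column, table 9).
first_position = {"SES": 44, "Size": 39, "Control": 68, "Region": 18, "Proximity": 43}
fifth_position = {"SES": 3, "Size": 14, "Control": 21, "Region": 4, "Proximity": 8}

ALPHA = 0.05  # assumed significance level


def rejected(counts, total=90):
    """Variables for which the zero-effect hypothesis is rejected."""
    return sorted(v for v, k in counts.items() if sign_test_p_value(k, total) < ALPHA)


rejected_first = rejected(first_position)
rejected_fifth = rejected(fifth_position)

print("first position:", rejected_first)
print("fifth position:", rejected_fifth)
```

Note that a rejection in the first position can go in either direction: for control the count of negative estimates is far above one-half, while for region it is far below.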



Table 9. Number of negative variance component estimates for stratification terms in first and fifth
positions in model

                                                            Difference of
                                                            domain and
Variable and      National means       Domain means         national means       All statistics
position          Negative             Negative             Negative             Negative
in model          estimates   Total    estimates   Total    estimates   Total    estimates   Total

First position
  SES                 2         10        17         40        25         40        44         90
  Size                1         10        13         40        25         40        39         90
  Control             5         10        29         40        34         40        68         90
  Region              0         10         5         40        13         40        18         90
  Proximity           2         10        17         40        24         40        43         90

Fifth position
  SES                 0         10         2         40         1         40         3         90
  Size                1         10         5         40         8         40        14         90
  Control             0         10         9         40        12         40        21         90
  Region              0         10         0         40         4         40         4         90
  Proximity           1         10         4         40         3         40         8         90
For the aforementioned reasons, it was decided to eliminate control from
the model and enter region in the model at first position. Then to
evaluate the relative importance of the remaining three variables, the
three were permuted in all 3! = 6 possible orders and six additional
variance component runs were made using the same 10 variables and the
same four domains as used in the previous runs. The orderings of the
stratification variables for the six variance component runs were:

Region - SES - Size - Proximity,
Region - SES - Proximity - Size,
Region - Size - SES - Proximity,
Region - Size - Proximity - SES,
Region - Proximity - SES - Size, and
Region - Proximity - Size - SES.
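
The six orderings listed above can be generated mechanically by holding region in first position and permuting the other three variables. The sketch below is only an illustration of that design; the variable labels are taken from the list and the helper names are hypothetical.

```python
from collections import Counter
from itertools import permutations

FIXED_FIRST = "Region"                   # always entered first in the model
REMAINING = ["SES", "Size", "Proximity"]  # permuted over positions 2 to 4

# Region followed by each of the 3! = 6 arrangements of the other variables.
orderings = [(FIXED_FIRST, *perm) for perm in permutations(REMAINING)]

# Count how often each variable lands in each model position (1-based).
position_counts = Counter(
    (variable, position)
    for ordering in orderings
    for position, variable in enumerate(ordering, start=1)
)

for ordering in orderings:
    print(" - ".join(ordering))
```

Each of the three permuted variables appears twice in each of positions two, three, and four, so every variable contributes the same number of runs to each position.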

The number of negative component estimates was observed for each of the
three stratification variables in positions two, three, and four. These
counts are shown in table 10. A sign test would result in rejection of
the hypothesis of a zero effect for each variable in each position.
Thus, we can conclude that each of these variables was effective in
reducing the variances of estimates. If we use the number of negative
variance component estimates as a criterion describing the magnitude of
the effects, then we might conclude that the five stratification
variables might be ranked from most useful to least useful as region,
SES, proximity, size, and control. Thus, while we have not been able to
precisely estimate how much of the stratification effect to attribute to
each of the variables, we have some rough indications of the relative
importance of the five major stratification variables. We also have an
indication that control may not have been a very useful stratification
variable, but that region, SES, size, and proximity were all useful
stratification variables.



Table 10. Number of negative variance component estimates for terms in second, third, and fourth
positions in model

                                                            Difference of
                                                            domain and
Variable and      National means       Domain means         national means       All statistics
position          Negative             Negative             Negative             Negative
in model          estimates   Total    estimates   Total    estimates   Total    estimates   Total

Second position
  SES                 0         20        14         80        25         80        39        180
  Size                2         20        15         80        28         80        45        180
  Proximity           0         20        19         80        28         80        47        180

Third position
  SES                 0         20         6         80         7         80        13        180
  Size                0         20         6         80        17         80        23        180
  Proximity           0         20         6         80        12         80        18        180

Fourth position
  SES                 0         20         3         80         2         80         5        180
  Size                0         20         6         80        10         80        16        180
  Proximity           2         20         6         80         4         80        12        180
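
The ranking criterion applied to table 10 (fewer negative estimates taken as a sign of a stronger stratification effect) can be illustrated with the "all statistics" counts. This is a rough sketch only: pooling the counts over the three positions is an assumption made here for illustration and is not a procedure described in the report.

```python
# Negative estimates out of 180 statistics per position ("all statistics", table 10).
negatives = {
    "SES":       {"second": 39, "third": 13, "fourth": 5},
    "Size":      {"second": 45, "third": 23, "fourth": 16},
    "Proximity": {"second": 47, "third": 18, "fourth": 12},
}
STATISTICS_PER_POSITION = 180

# Pool the counts over positions two, three, and four (illustrative assumption).
pooled = {variable: sum(counts.values()) for variable, counts in negatives.items()}

# Fewer negative estimates is read as a larger stratification effect.
ranking = sorted(pooled, key=pooled.get)

for variable in ranking:
    share = pooled[variable] / (3 * STATISTICS_PER_POSITION)
    print(f"{variable:<10} {pooled[variable]:>3} negative of 540 ({share:.1%})")
```

With these counts the pooled ordering is SES, proximity, size, which agrees with the order given in the text for the three variables that were permuted.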


